Facebook Develops Machine Translation System for 100 Languages

2020-10-22

Facebook has developed the first machine learning model that can translate between any two of 100 languages without going into English first.

Facebook says the new multilingual machine translation model was created to help its more than two billion users worldwide. The company is still testing the translation system - which it calls M2M-100 - and hopes to add it to different products in the future.

The social media service says it has made the system open source -- meaning its computer code will be freely available for others to copy or change.

Angela Fan, a research assistant at Facebook, explained the new machine translation model this week on one of the company's websites. She said its development represented a "milestone" in progress after years of "foundational work in machine translation."

Fan said the model produces better results than other machine learning systems that depend on English to help in the translation process. The other systems use it as an intermediate step -- like a bridge -- to translate between two non-English languages.

One example would be a translation from Chinese to French. Fan noted that many machine translation models begin by translating from Chinese to English first, and then from English to French. This is done "because English training data is the most widely available," she said. But such a method can lead to mistakes in translation.

"Our model directly trains on Chinese to French data to better preserve meaning," Fan said. Facebook said the system outperformed English-centered systems in a widely used system that uses data to measure the quality of machine translations.
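
The difference between the two routes can be shown with a small Python sketch. The word tables below are invented for illustration and are not taken from M2M-100; a real system works on whole sentences with a trained model. Chinese has an informal "you" (你) and a polite "you" (您), and so does French ("tu" and "vous"), but English has only one word. Going through English loses that difference.

```python
# Toy word-level lookup tables (hypothetical data, for illustration only).
ZH_TO_EN = {"你": "you", "您": "you"}   # English has one word for both forms
EN_TO_FR = {"you": "vous"}              # so the bridge step must pick one French form
ZH_TO_FR = {"你": "tu", "您": "vous"}   # direct data keeps the distinction


def translate(word, table):
    """Look up one word; return it unchanged if the table has no entry."""
    return table.get(word, word)


def pivot_zh_to_fr(word):
    """Chinese -> English -> French, the English-centered route."""
    return translate(translate(word, ZH_TO_EN), EN_TO_FR)


def direct_zh_to_fr(word):
    """Chinese -> French in a single step."""
    return translate(word, ZH_TO_FR)


for word in ("你", "您"):
    print(word, "pivot:", pivot_zh_to_fr(word), "direct:", direct_zh_to_fr(word))
```

In this toy case the bridge route turns both Chinese words into "vous," while the direct route keeps "tu" for the informal form.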

Facebook says about two-thirds of its users communicate in a language other than English. The company already carries out an average of 20 billion translations every day on Facebook's News Feed. But it faces a huge test with many users publishing massive amounts of content in more than 160 languages.

The development team trained, or directed, the new model on a data set of 7.5 billion sentence pairs for 100 languages. In addition, the system was trained on a total of 2,200 language directions. Facebook said this is 10 times the number on the best machine translation models in the past.
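
To put the 2,200 figure in context, the short calculation below counts how many translation directions are possible among 100 languages. It assumes a "direction" is an ordered pair, so Chinese to French and French to Chinese count separately. The comparison with an English-only bridge setup is our own illustration, not a figure from Facebook.

```python
def ordered_pairs(n_languages):
    """Number of source -> target directions among n languages (order matters)."""
    return n_languages * (n_languages - 1)


def english_centered(n_languages):
    """Directions that only go into or out of English."""
    return 2 * (n_languages - 1)


N_LANGUAGES = 100   # from the article
TRAINED = 2200      # directions the article says the model was trained on

total = ordered_pairs(N_LANGUAGES)
print(total)                          # 9900 possible directions
print(english_centered(N_LANGUAGES))  # 198 directions if everything goes through English
print(round(TRAINED / total, 3))      # 0.222, the share of all directions trained on
```

Under these assumptions, the 2,200 trained directions cover a little over one fifth of all possible pairs.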

One difficulty the team faced was trying to develop an effective machine translation system for language combinations that are not widely used. Facebook calls these "low-resource languages." The data used to create the new model was collected from content available on the internet. But there is limited internet data on low-resource languages.

To deal with this problem, Facebook said it used a method called back-translation. This method can create "synthetic translations" to increase the amount of data used to train on low-resource languages.
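
The general idea of back-translation can be sketched in a few lines of Python. The article does not describe how Facebook built its version, so this is only a minimal sketch of the technique as it is commonly described: text that exists only in the target language is translated backward by a reverse model, and each machine-made source sentence is paired with the real target sentence. The function names and the toy French-to-Chinese lookup are assumptions made for this example.

```python
def back_translate(monolingual_target, reverse_model):
    """Build synthetic (source, target) pairs from target-language-only text.

    reverse_model translates target -> source. Its output is the synthetic
    side of each pair; the target side stays real, human-written text.
    """
    return [(reverse_model(sentence), sentence) for sentence in monolingual_target]


# Stand-in for a trained French -> Chinese model (hypothetical toy lookup).
TOY_FR_TO_ZH = {"bonjour": "你好", "merci": "谢谢"}


def toy_reverse_model(sentence):
    """Return the toy translation, or a placeholder for unknown input."""
    return TOY_FR_TO_ZH.get(sentence, "<unk>")


real_pairs = [("你好", "bonjour")]                            # scarce human-made data
synthetic_pairs = back_translate(["merci"], toy_reverse_model)
training_data = real_pairs + synthetic_pairs                   # larger training set
print(training_data)
```

The combined list holds one real pair and one synthetic pair, which is how the method adds training data for language pairs with little material online.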

For now, the company says, it plans to continue exploring new language research methods while working to improve the new model. No date has been set for launching the translation system on Facebook.

But Angela Fan said the new system marks an important step for Facebook, especially for the times we live in. "Breaking language barriers through machine language translation is one of the most important ways to bring people together, provide authoritative information on COVID-19, and keep them safe from harmful content," she said.

I'm Bryan Lynn.

Bryan Lynn wrote this story for VOA Learning English, based on reports from Facebook and Agence France-Presse. George Grow was the editor.

_______________________________________________________________

Words in This Story

translate - v. change written or spoken words from one language to another

code - n. a set of rules used to instruct computers how to behave or do things

milestone - n. an event that reaches never before seen levels

intermediate - adj. between two different stages in a process

preserve - v. keep something the same or prevent it from being damaged or destroyed

pair - n. two things that look the same and are used together

content - n. information contained in a piece of writing, a speech, a movie or on the internet

synthetic - adj. not made from substances or in the usual way

authoritative - adj. respected and considered to be accurate